Open-Meteo maintains an API for historical weather that allows for non-commercial usage of historical weather data maintained by the website.
This file builds on _v001, _v002, _v003, and _v004 to run exploratory analysis on some historical weather data.
The exploration process uses tidyverse, ranger, several generic custom functions, and several functions specific to Open Meteo processing. First, tidyverse, ranger, and the generic functions are loaded:
library(tidyverse) # tidyverse functionality is included throughout
## Warning: package 'ggplot2' was built under R version 4.2.3
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'stringr' was built under R version 4.2.3
## Warning: package 'lubridate' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ranger) # predict() does not work on ranger objects unless ranger has been called
## Warning: package 'ranger' was built under R version 4.2.3
source("./Generic_Added_Utility_Functions_202105_v001.R") # Basic functions
Next, specific functions written in _v001, _v002, _v003, and _v004 are sourced:
source("./SimpleOneVar_Functions_202411_v001.R") # Functions for basic single variable analysis
source("./OpenMeteo_NextBest_202411_v001.R") # Functions for finding 'next best' predictor given existing model
source("./OpenMeteo_Functions_202411_v001.R") # Core functions for loading, processing, analysis of Open Meteo
Key mapping tables for available metrics are also copied:
hourlyMetrics <- "temperature_2m,relativehumidity_2m,dewpoint_2m,apparent_temperature,pressure_msl,surface_pressure,precipitation,rain,snowfall,cloudcover,cloudcover_low,cloudcover_mid,cloudcover_high,shortwave_radiation,direct_radiation,direct_normal_irradiance,diffuse_radiation,windspeed_10m,windspeed_100m,winddirection_10m,winddirection_100m,windgusts_10m,et0_fao_evapotranspiration,weathercode,vapor_pressure_deficit,soil_temperature_0_to_7cm,soil_temperature_7_to_28cm,soil_temperature_28_to_100cm,soil_temperature_100_to_255cm,soil_moisture_0_to_7cm,soil_moisture_7_to_28cm,soil_moisture_28_to_100cm,soil_moisture_100_to_255cm"
dailyMetrics <- "weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration"
hourlyDescription <- "Air temperature at 2 meters above ground\nRelative humidity at 2 meters above ground\nDew point temperature at 2 meters above ground\nApparent temperature is the perceived feels-like temperature combining wind chill factor, relative humidity and solar radiation\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nTotal precipitation (rain, showers, snow) sum of the preceding hour. Data is stored with a 0.1 mm precision. If precipitation data is summed up to monthly sums, there might be small inconsistencies with the total precipitation amount.\nOnly liquid precipitation of the preceding hour including local showers and rain from large scale systems.\nSnowfall amount of the preceding hour in centimeters. For the water equivalent in millimeter, divide by 7. E.g. 7 cm snow = 10 mm precipitation water equivalent\nTotal cloud cover as an area fraction\nLow level clouds and fog up to 2 km altitude\nMid level clouds from 2 to 6 km altitude\nHigh level clouds from 6 km altitude\nShortwave solar radiation as average of the preceding hour. This is equal to the total global horizontal irradiation\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDiffuse solar radiation as average of the preceding hour\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind direction at 10 or 100 meters above ground\nWind direction at 10 or 100 meters above ground\nGusts at 10 meters above ground of the indicated hour. Wind gusts in CERRA are defined as the maximum wind gusts of the preceding hour. Please consult the ECMWF IFS documentation for more information on how wind gusts are parameterized in weather models.\nET0 Reference Evapotranspiration of a well watered grass field. Based on FAO-56 Penman-Monteith equations ET0 is calculated from temperature, wind speed, humidity and solar radiation. Unlimited soil water is assumed. ET0 is commonly used to estimate the required irrigation for plants.\nWeather condition as a numeric code. Follow WMO weather interpretation codes. See table below for details. Weather code is calculated from cloud cover analysis, precipitation and snowfall. As barely no information about atmospheric stability is available, estimation about thunderstorms is not possible.\nVapor Pressure Deificit (VPD) in kilopascal (kPa). For high VPD (>1.6), water transpiration of plants increases. For low VPD (<0.4), transpiration decreases\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths."
dailyDescription <- "The most severe weather condition on a given day\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily apparent temperature\nMaximum and minimum daily apparent temperature\nSum of daily precipitation (including rain, showers and snowfall)\nSum of daily rain\nSum of daily snowfall\nThe number of hours with rain\nSun rise and set times\nSun rise and set times\nMaximum wind speed and gusts on a day\nMaximum wind speed and gusts on a day\nDominant wind direction\nThe sum of solar radiaion on a given day in Megajoules\nDaily sum of ET0 Reference Evapotranspiration of a well watered grass field"
# Create tibble for hourly metrics
tblMetricsHourly <- tibble::tibble(metric=hourlyMetrics %>% str_split_1(","),
description=hourlyDescription %>% str_split_1("\n")
)
tblMetricsHourly %>%
print(n=50)
## # A tibble: 33 × 2
## metric description
## <chr> <chr>
## 1 temperature_2m Air temperature at 2 meters above ground
## 2 relativehumidity_2m Relative humidity at 2 meters above ground
## 3 dewpoint_2m Dew point temperature at 2 meters above ground
## 4 apparent_temperature Apparent temperature is the perceived feels-li…
## 5 pressure_msl Atmospheric air pressure reduced to mean sea l…
## 6 surface_pressure Atmospheric air pressure reduced to mean sea l…
## 7 precipitation Total precipitation (rain, showers, snow) sum …
## 8 rain Only liquid precipitation of the preceding hou…
## 9 snowfall Snowfall amount of the preceding hour in centi…
## 10 cloudcover Total cloud cover as an area fraction
## 11 cloudcover_low Low level clouds and fog up to 2 km altitude
## 12 cloudcover_mid Mid level clouds from 2 to 6 km altitude
## 13 cloudcover_high High level clouds from 6 km altitude
## 14 shortwave_radiation Shortwave solar radiation as average of the pr…
## 15 direct_radiation Direct solar radiation as average of the prece…
## 16 direct_normal_irradiance Direct solar radiation as average of the prece…
## 17 diffuse_radiation Diffuse solar radiation as average of the prec…
## 18 windspeed_10m Wind speed at 10 or 100 meters above ground. W…
## 19 windspeed_100m Wind speed at 10 or 100 meters above ground. W…
## 20 winddirection_10m Wind direction at 10 or 100 meters above ground
## 21 winddirection_100m Wind direction at 10 or 100 meters above ground
## 22 windgusts_10m Gusts at 10 meters above ground of the indicat…
## 23 et0_fao_evapotranspiration ET0 Reference Evapotranspiration of a well wat…
## 24 weathercode Weather condition as a numeric code. Follow WM…
## 25 vapor_pressure_deficit Vapor Pressure Deificit (VPD) in kilopascal (k…
## 26 soil_temperature_0_to_7cm Average temperature of different soil levels b…
## 27 soil_temperature_7_to_28cm Average temperature of different soil levels b…
## 28 soil_temperature_28_to_100cm Average temperature of different soil levels b…
## 29 soil_temperature_100_to_255cm Average temperature of different soil levels b…
## 30 soil_moisture_0_to_7cm Average soil water content as volumetric mixin…
## 31 soil_moisture_7_to_28cm Average soil water content as volumetric mixin…
## 32 soil_moisture_28_to_100cm Average soil water content as volumetric mixin…
## 33 soil_moisture_100_to_255cm Average soil water content as volumetric mixin…
# Create tibble for daily metrics
tblMetricsDaily <- tibble::tibble(metric=dailyMetrics %>% str_split_1(","),
description=dailyDescription %>% str_split_1("\n")
)
tblMetricsDaily
## # A tibble: 16 × 2
## metric description
## <chr> <chr>
## 1 weathercode The most severe weather condition on a given day
## 2 temperature_2m_max Maximum and minimum daily air temperature at 2 me…
## 3 temperature_2m_min Maximum and minimum daily air temperature at 2 me…
## 4 apparent_temperature_max Maximum and minimum daily apparent temperature
## 5 apparent_temperature_min Maximum and minimum daily apparent temperature
## 6 precipitation_sum Sum of daily precipitation (including rain, showe…
## 7 rain_sum Sum of daily rain
## 8 snowfall_sum Sum of daily snowfall
## 9 precipitation_hours The number of hours with rain
## 10 sunrise Sun rise and set times
## 11 sunset Sun rise and set times
## 12 windspeed_10m_max Maximum wind speed and gusts on a day
## 13 windgusts_10m_max Maximum wind speed and gusts on a day
## 14 winddirection_10m_dominant Dominant wind direction
## 15 shortwave_radiation_sum The sum of solar radiaion on a given day in Megaj…
## 16 et0_fao_evapotranspiration Daily sum of ET0 Reference Evapotranspiration of …
A previously existing dataset is loaded, with key analysis variables defined in a vector:
# Load previous data
allCity <- readFromRDS("allCity_20241116")
# Get core training variables
varsTrain <- getVarsTrain(allCity)
varsTrain
## [1] "hour" "temperature_2m"
## [3] "relativehumidity_2m" "dewpoint_2m"
## [5] "apparent_temperature" "pressure_msl"
## [7] "surface_pressure" "precipitation"
## [9] "rain" "snowfall"
## [11] "cloudcover" "cloudcover_low"
## [13] "cloudcover_mid" "cloudcover_high"
## [15] "shortwave_radiation" "direct_radiation"
## [17] "direct_normal_irradiance" "diffuse_radiation"
## [19] "windspeed_10m" "windspeed_100m"
## [21] "winddirection_10m" "winddirection_100m"
## [23] "windgusts_10m" "et0_fao_evapotranspiration"
## [25] "weathercode" "vapor_pressure_deficit"
## [27] "soil_temperature_0_to_7cm" "soil_temperature_7_to_28cm"
## [29] "soil_temperature_28_to_100cm" "soil_temperature_100_to_255cm"
## [31] "soil_moisture_0_to_7cm" "soil_moisture_7_to_28cm"
## [33] "soil_moisture_28_to_100cm" "soil_moisture_100_to_255cm"
## [35] "year" "doy"
# Assign default label
keyLabel <- genericKeyLabelOM()
keyLabel
## [1] "predictions based on pre-2022 training data applied to 2022 holdout dataset"
The correlation heatmap is reproduced, with functions that borrowing from the recipe at STHDA:
# Default function
corVarsTrain <- makeHeatMap(allCity, vecSelect=varsTrain, returnData=TRUE)
corVarsTrain %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
## Var1 Var2 value
## <fct> <fct> <dbl>
## 1 rain precipitation 0.989
## 2 apparent_temperature temperature_2m 0.984
## 3 direct_radiation shortwave_radiation 0.974
## 4 temperature_2m soil_temperature_0_to_7cm 0.962
## 5 soil_temperature_28_to_100cm soil_temperature_7_to_28cm 0.952
## 6 shortwave_radiation et0_fao_evapotranspiration 0.944
## 7 apparent_temperature soil_temperature_0_to_7cm 0.942
## 8 soil_moisture_7_to_28cm soil_moisture_0_to_7cm 0.940
## 9 windspeed_100m windspeed_10m 0.940
## 10 direct_radiation et0_fao_evapotranspiration 0.928
## 11 direct_radiation direct_normal_irradiance 0.922
## 12 soil_moisture_28_to_100cm soil_moisture_7_to_28cm 0.922
## 13 soil_temperature_0_to_7cm soil_temperature_7_to_28cm 0.919
## 14 apparent_temperature soil_temperature_7_to_28cm 0.918
## 15 temperature_2m soil_temperature_7_to_28cm 0.917
## 16 shortwave_radiation direct_normal_irradiance 0.901
## 17 windspeed_10m windgusts_10m 0.884
## 18 soil_temperature_100_to_255cm soil_temperature_28_to_100cm 0.863
## 19 soil_moisture_100_to_255cm soil_moisture_28_to_100cm 0.859
## 20 apparent_temperature soil_temperature_28_to_100cm 0.853
## # ℹ 610 more rows
The correlation heatmap is produced for a single city:
corVarsTrainBOS <- makeHeatMap(allCity %>% filter(src=="Boston"), vecSelect=varsTrain, returnData=TRUE)
corVarsTrainBOS %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
## Var1 Var2 value
## <fct> <fct> <dbl>
## 1 surface_pressure pressure_msl 1.00
## 2 apparent_temperature temperature_2m 0.993
## 3 temperature_2m soil_temperature_0_to_7cm 0.965
## 4 apparent_temperature soil_temperature_0_to_7cm 0.965
## 5 soil_temperature_0_to_7cm soil_temperature_7_to_28cm 0.965
## 6 direct_radiation shortwave_radiation 0.962
## 7 rain precipitation 0.959
## 8 soil_temperature_28_to_100cm soil_temperature_7_to_28cm 0.950
## 9 windspeed_100m windspeed_10m 0.946
## 10 apparent_temperature dewpoint_2m 0.937
## 11 shortwave_radiation et0_fao_evapotranspiration 0.935
## 12 windspeed_10m windgusts_10m 0.927
## 13 apparent_temperature soil_temperature_7_to_28cm 0.925
## 14 temperature_2m dewpoint_2m 0.917
## 15 direct_radiation direct_normal_irradiance 0.916
## 16 soil_moisture_7_to_28cm soil_moisture_0_to_7cm 0.915
## 17 temperature_2m soil_temperature_7_to_28cm 0.914
## 18 direct_radiation et0_fao_evapotranspiration 0.902
## 19 winddirection_100m winddirection_10m 0.895
## 20 soil_temperature_0_to_7cm soil_temperature_28_to_100cm 0.886
## # ℹ 610 more rows
Some variables, such as surface pressure vs. MSL pressure, are much differently correlated if controlling for city. Differences are explored:
tstCorDelta <- corVarsTrain %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_all=value) %>%
full_join(corVarsTrainBOS %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_bos=value),
by=c("v1", "v2")
) %>%
mutate(delta=value_bos-value_all)
tstCorDelta
## # A tibble: 666 × 5
## v1 v2 value_all value_bos delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 dewpoint_2m dewpoint_2m 1 1 0
## 2 dewpoint_2m soil_temperature_7_to_28cm 0.675 0.874 0.199
## 3 dewpoint_2m soil_temperature_28_to_100cm 0.630 0.807 0.177
## 4 dewpoint_2m soil_temperature_0_to_7cm 0.623 0.875 0.252
## 5 dewpoint_2m temperature_2m 0.691 0.917 0.226
## 6 apparent_temperature dewpoint_2m 0.785 0.937 0.152
## 7 dewpoint_2m soil_temperature_100_to_255cm 0.448 0.505 0.0567
## 8 dewpoint_2m doy 0.185 0.312 0.127
## 9 dewpoint_2m hour 0.00764 0.0203 0.0127
## 10 dewpoint_2m vapor_pressure_deficit -0.0678 0.305 0.373
## # ℹ 656 more rows
tstCorDelta %>% arrange(delta) %>% head(20)
## # A tibble: 20 × 5
## v1 v2 value_all value_bos delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 relativehumidity_2m surface_pressure 0.578 -0.160 -0.738
## 2 soil_moisture_0_to_7cm surface_pressure 0.594 -0.0914 -0.685
## 3 soil_moisture_7_to_28cm surface_pressure 0.586 -0.0240 -0.610
## 4 dewpoint_2m surface_pressure 0.364 -0.237 -0.601
## 5 soil_moisture_100_to_255cm surface_pressure 0.602 0.00609 -0.596
## 6 soil_moisture_28_to_100cm surface_pressure 0.588 0.0215 -0.566
## 7 surface_pressure windspeed_100m 0.247 -0.311 -0.558
## 8 dewpoint_2m soil_moisture_28_to_10… 0.0416 -0.508 -0.549
## 9 surface_pressure windspeed_10m 0.195 -0.304 -0.498
## 10 dewpoint_2m soil_moisture_7_to_28cm 0.0435 -0.448 -0.491
## 11 surface_pressure windgusts_10m 0.133 -0.332 -0.465
## 12 dewpoint_2m soil_moisture_0_to_7cm 0.0722 -0.393 -0.465
## 13 relativehumidity_2m soil_moisture_28_to_10… 0.401 -0.0532 -0.454
## 14 relativehumidity_2m soil_moisture_100_to_2… 0.374 -0.0545 -0.429
## 15 cloudcover surface_pressure 0.279 -0.144 -0.423
## 16 relativehumidity_2m soil_moisture_7_to_28cm 0.441 0.0357 -0.406
## 17 soil_moisture_0_to_7cm soil_moisture_100_to_2… 0.796 0.402 -0.393
## 18 cloudcover_low surface_pressure 0.197 -0.186 -0.383
## 19 surface_pressure weathercode 0.140 -0.240 -0.380
## 20 cloudcover_mid surface_pressure 0.142 -0.221 -0.362
tstCorDelta %>% arrange(desc(delta)) %>% head(20)
## # A tibble: 20 × 5
## v1 v2 value_all value_bos delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 soil_moisture_100_to_255cm year 0.0387 0.739 0.701
## 2 pressure_msl surface_pressure 0.401 1.00 0.599
## 3 soil_moisture_28_to_100cm year -0.0677 0.386 0.454
## 4 soil_moisture_7_to_28cm year -0.0426 0.406 0.449
## 5 surface_pressure vapor_pressure_deficit -0.493 -0.0499 0.443
## 6 dewpoint_2m vapor_pressure_deficit -0.0678 0.305 0.373
## 7 soil_moisture_100_to_255cm vapor_pressure_deficit -0.337 -0.00259 0.334
## 8 soil_moisture_0_to_7cm year 0.0110 0.334 0.323
## 9 doy soil_temperature_100_to… 0.453 0.764 0.311
## 10 relativehumidity_2m soil_temperature_0_to_7… -0.253 0.00738 0.260
## 11 dewpoint_2m soil_temperature_0_to_7… 0.623 0.875 0.252
## 12 relativehumidity_2m soil_temperature_28_to_… -0.133 0.116 0.249
## 13 relativehumidity_2m temperature_2m -0.213 0.0356 0.249
## 14 relativehumidity_2m soil_temperature_7_to_2… -0.139 0.109 0.247
## 15 relativehumidity_2m soil_temperature_100_to… -0.120 0.123 0.243
## 16 pressure_msl vapor_pressure_deficit -0.292 -0.0521 0.240
## 17 dewpoint_2m temperature_2m 0.691 0.917 0.226
## 18 cloudcover temperature_2m -0.162 0.0534 0.215
## 19 direct_normal_irradiance surface_pressure -0.118 0.0838 0.201
## 20 dewpoint_2m soil_temperature_7_to_2… 0.675 0.874 0.199
The process is repeated, using an aggregation of correlations in each of the seven cities:
keyCities <- allCity %>% pull(src) %>% unique()
keyCities
## [1] "NYC" "LA" "Chicago" "Houston" "Vegas" "Miami" "Boston"
corVarsTrainEach <- keyCities %>%
map_dfr(.f=function(x) makeHeatMap(allCity %>% filter(src==x),
vecSelect=varsTrain[!varsTrain %in% ifelse(x=="Miami", "snowfall", "")],
returnData=TRUE,
plotMap=FALSE
) %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value), .id="src") %>%
mutate(city=keyCities[as.integer(src)]) %>%
select(src, city, everything())
corVarsTrainEach
## # A tibble: 4,626 × 5
## src city v1 v2 value
## <chr> <chr> <chr> <chr> <dbl>
## 1 1 NYC soil_temperature_7_to_28cm soil_temperature_7_to_28cm 1
## 2 1 NYC soil_temperature_28_to_100cm soil_temperature_7_to_28cm 0.952
## 3 1 NYC dewpoint_2m soil_temperature_7_to_28cm 0.883
## 4 1 NYC soil_temperature_0_to_7cm soil_temperature_7_to_28cm 0.954
## 5 1 NYC soil_temperature_7_to_28cm temperature_2m 0.932
## 6 1 NYC apparent_temperature soil_temperature_7_to_28cm 0.937
## 7 1 NYC soil_temperature_100_to_255cm soil_temperature_7_to_28cm 0.584
## 8 1 NYC doy soil_temperature_7_to_28cm 0.360
## 9 1 NYC cloudcover_mid soil_temperature_7_to_28cm -0.100
## 10 1 NYC cloudcover_high soil_temperature_7_to_28cm 0.0134
## # ℹ 4,616 more rows
corVarsTrainEach %>%
arrange(desc(value)) %>%
filter(v1!=v2) %>%
slice(1:15, (nrow(.)-14):nrow(.)) %>%
print(n=30)
## # A tibble: 30 × 5
## src city v1 v2 value
## <chr> <chr> <chr> <chr> <dbl>
## 1 6 Miami precipitation rain 1
## 2 6 Miami pressure_msl surface_pressure 1.00
## 3 7 Boston pressure_msl surface_pressure 1.00
## 4 2 LA precipitation rain 1.00
## 5 4 Houston pressure_msl surface_pressure 1.00
## 6 4 Houston precipitation rain 1.00
## 7 1 NYC pressure_msl surface_pressure 1.00
## 8 5 Vegas precipitation rain 0.999
## 9 3 Chicago pressure_msl surface_pressure 0.994
## 10 1 NYC apparent_temperature temperature_2m 0.993
## 11 3 Chicago apparent_temperature temperature_2m 0.993
## 12 7 Boston apparent_temperature temperature_2m 0.993
## 13 5 Vegas direct_radiation shortwave_radiation 0.989
## 14 5 Vegas apparent_temperature temperature_2m 0.988
## 15 3 Chicago precipitation rain 0.988
## 16 1 NYC soil_moisture_0_to_7cm soil_temperature_7_to_28cm -0.680
## 17 1 NYC soil_moisture_7_to_28cm soil_temperature_0_to_7cm -0.681
## 18 1 NYC soil_moisture_100_to_255cm soil_temperature_100_to_255cm -0.687
## 19 1 NYC soil_moisture_28_to_100cm soil_temperature_100_to_255cm -0.688
## 20 4 Houston et0_fao_evapotranspiration relativehumidity_2m -0.689
## 21 1 NYC soil_moisture_28_to_100cm soil_temperature_7_to_28cm -0.707
## 22 1 NYC soil_moisture_7_to_28cm soil_temperature_28_to_100cm -0.715
## 23 1 NYC soil_moisture_7_to_28cm soil_temperature_7_to_28cm -0.720
## 24 3 Chicago soil_moisture_100_to_255cm year -0.758
## 25 1 NYC soil_moisture_28_to_100cm soil_temperature_28_to_100cm -0.777
## 26 4 Houston relativehumidity_2m vapor_pressure_deficit -0.791
## 27 2 LA relativehumidity_2m vapor_pressure_deficit -0.818
## 28 5 Vegas soil_moisture_28_to_100cm year -0.828
## 29 5 Vegas soil_moisture_7_to_28cm year -0.831
## 30 6 Miami relativehumidity_2m vapor_pressure_deficit -0.868
Comparison of aggregated individual city correlations to the overall correlations:
corVarsTrainDelta <- corVarsTrain %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_all=value) %>%
full_join(corVarsTrainEach %>%
group_by(v1, v2) %>%
summarize(value=mean(value), .groups="drop"),
by=c("v1", "v2")) %>%
mutate(delta=value-value_all)
corVarsTrainDelta %>%
arrange(delta) %>%
slice(1:15, (nrow(.)-14):nrow(.)) %>%
print(n=30)
## # A tibble: 30 × 5
## v1 v2 value_all value delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 relativehumidity_2m surface_pressure 0.578 -0.178 -0.756
## 2 dewpoint_2m surface_pressure 0.364 -0.371 -0.735
## 3 soil_moisture_100_to_255cm soil_moisture_7_to_28cm 0.824 0.156 -0.668
## 4 soil_moisture_0_to_7cm soil_moisture_100_to_25… 0.796 0.138 -0.658
## 5 soil_moisture_0_to_7cm surface_pressure 0.594 -0.0212 -0.615
## 6 soil_moisture_100_to_255cm surface_pressure 0.602 -0.00359 -0.606
## 7 soil_moisture_100_to_255cm soil_moisture_28_to_100… 0.859 0.283 -0.576
## 8 soil_moisture_7_to_28cm surface_pressure 0.586 0.0143 -0.572
## 9 soil_moisture_28_to_100cm surface_pressure 0.588 0.0277 -0.560
## 10 cloudcover surface_pressure 0.279 -0.138 -0.417
## 11 surface_pressure windspeed_100m 0.247 -0.165 -0.412
## 12 relativehumidity_2m soil_moisture_100_to_25… 0.374 -0.00330 -0.378
## 13 soil_moisture_0_to_7cm soil_moisture_28_to_100… 0.842 0.491 -0.351
## 14 surface_pressure windspeed_10m 0.195 -0.149 -0.344
## 15 relativehumidity_2m soil_moisture_28_to_100… 0.401 0.0674 -0.333
## 16 vapor_pressure_deficit windspeed_10m -0.0519 0.100 0.152
## 17 relativehumidity_2m soil_temperature_100_to… -0.120 0.0325 0.153
## 18 et0_fao_evapotranspiration soil_moisture_100_to_25… -0.138 0.0221 0.160
## 19 diffuse_radiation vapor_pressure_deficit 0.283 0.449 0.166
## 20 soil_moisture_28_to_100cm vapor_pressure_deficit -0.404 -0.234 0.170
## 21 soil_moisture_100_to_255cm temperature_2m -0.204 -0.0213 0.183
## 22 soil_moisture_100_to_255cm soil_temperature_0_to_7… -0.221 -0.0296 0.192
## 23 soil_moisture_100_to_255cm soil_temperature_28_to_… -0.308 -0.115 0.193
## 24 dewpoint_2m vapor_pressure_deficit -0.0678 0.136 0.203
## 25 soil_moisture_100_to_255cm soil_temperature_7_to_2… -0.257 -0.0527 0.205
## 26 direct_normal_irradiance surface_pressure -0.118 0.0875 0.205
## 27 doy soil_temperature_100_to… 0.453 0.739 0.286
## 28 soil_moisture_100_to_255cm vapor_pressure_deficit -0.337 -0.0176 0.319
## 29 surface_pressure vapor_pressure_deficit -0.493 -0.0665 0.427
## 30 pressure_msl surface_pressure 0.401 0.983 0.582
Differences by variable are explored:
corVarsTrain %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_all=value) %>%
full_join(corVarsTrainEach, by=c("v1", "v2")) %>%
mutate(delta=value-value_all) %>%
filter(v1!=v2) %>%
select(city, v1, v2, delta) %>%
bind_rows(., ., .id="src") %>%
mutate(vrbl=case_when(src=="1"~v1, src=="2"~v2, TRUE~NA)) %>%
ggplot(aes(x=fct_reorder(vrbl, delta, .fun=function(x) diff(range(x))), y=delta)) +
geom_boxplot(fill="lightblue") +
coord_flip() +
geom_hline(yintercept=0, color="black", lty=2) +
labs(title="Correlation delta (individual minus all-city aggregate)",
x=NULL,
y="Delta (individual minus all-city aggregate)",
subtitle="Calculated for each variable combination and city"
)
Soil moisture, surface pressure, and dewpoint are among the variables with the highest changes in correlation when calculated n individual cities and the all-city aggregate
An example Simpson’s paradox is MSL pressure vs. surface pressure:
allCity %>%
count(src, surface_pressure, pressure_msl) %>%
ggplot(aes(x=surface_pressure, pressure_msl)) +
geom_smooth(aes(weight=n, color=src), method="lm") +
geom_smooth(method="lm", lty=2, aes(weight=n), color="black") +
labs(title="Relationship between MSL pressure and surface pressure",
subtitle="Dashed black line is overall relationship"
) +
scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
The process is converted to functional form:
tmpSmoothPlot <- function(df, x, y, xName=x, yName=y, printPlot=TRUE, returnPlot=!isTRUE(printPlot)) {
# FUNCTION ARGUMENTS:
# df: the data frame
# x: x variable
# y: y variable
# xName: name to describe x variable
# yName: name to describe y variable
# printPlot: boolean, should plot be printed?
# returnPlot: boolean, should plot object be returned?
p1 <- df %>%
select(src, all_of(c(x, y))) %>%
purrr::set_names(c("src", "x1", "y1")) %>%
count(src, x1, y1) %>%
ggplot(aes(x=x1, y=y1)) +
geom_smooth(aes(weight=n, color=src), method="lm") +
geom_smooth(method="lm", lty=2, aes(weight=n), color="black") +
labs(title=paste0("Relationship between ", xName, " and ", yName),
subtitle="Dashed black line is overall relationship",
y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
) +
scale_color_discrete(NULL)
# Print plot if requested
if(isTRUE(printPlot)) print(p1)
# Return plot if requested
if(isTRUE(returnPlot)) return(p1)
}
# Example function call
tmpSmoothPlot(allCity,
x="surface_pressure",
y="pressure_msl",
yName="MSL Pressure",
xName="Surface Pressure"
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
The function is tested on two additional sets of metrics:
gridExtra::grid.arrange(tmpSmoothPlot(allCity %>%
mutate(across(c("apparent_temperature", "dewpoint_2m"),
.fns=function(x) round(x)
)
),
x="dewpoint_2m",
y="apparent_temperature",
xName="Dew Point",
yName="Apparent Temperature",
printPlot=FALSE
),
tmpSmoothPlot(allCity %>%
mutate(across(c("surface_pressure", "dewpoint_2m"),
.fns=function(x) round(x)
)
),
x="dewpoint_2m",
y="surface_pressure",
xName="Dew Point",
yName="Surface Pressure",
printPlot=FALSE
),
nrow=1
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
The function is updated to allow for rounding:
tmpSmoothPlot <- function(df,
x,
y,
xRound=NULL,
yRound=NULL,
xName=x,
yName=y,
printPlot=TRUE,
returnPlot=!isTRUE(printPlot)
) {
# FUNCTION ARGUMENTS:
# df: the data frame
# x: x variable
# y: y variable
# {x,y}Round: rounding to apply to vector {x,y} using function autoRound()
# NULL means no rounding (default)
# -1L means make an estimate based on data (around 100 buckets created)
# a positive float or integer means round everything to the nearest multiple
# xName: name to describe x variable
# yName: name to describe y variable
# printPlot: boolean, should plot be printed?
# returnPlot: boolean, should plot object be returned?
p1 <- df %>%
select(src, all_of(c(x, y))) %>%
purrr::set_names(c("src", "x1", "y1")) %>%
mutate(x1=autoRound(x1, rndTo=xRound),
y1=autoRound(y1, rndTo=yRound)
) %>%
count(src, x1, y1) %>%
ggplot(aes(x=x1, y=y1)) +
geom_smooth(aes(weight=n, color=src), method="lm") +
geom_smooth(method="lm", lty=2, aes(weight=n), color="black") +
labs(title=paste0("Relationship between ", xName, " and ", yName),
subtitle="Dashed black line is overall relationship",
y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
) +
scale_color_discrete(NULL)
# Print plot if requested
if(isTRUE(printPlot)) print(p1)
# Return plot if requested
if(isTRUE(returnPlot)) return(p1)
}
The updated function is tested on elements with many buckets:
# Example function call (no rounding)
t0 <- Sys.time()
system.time(
tmpSmoothPlot(allCity,
x="direct_normal_irradiance",
y="shortwave_radiation",
xName="Direct Solar Radiation",
yName="Shortwave Solar Radiation"
)
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## user system elapsed
## 3.55 0.32 3.91
Sys.time() - t0
## Time difference of 4.13413 secs
# Example function call (rounding)
t0 <- Sys.time()
system.time(
tmpSmoothPlot(allCity,
x="direct_normal_irradiance",
y="shortwave_radiation",
xRound=-1L,
yRound=-1L,
xName="Direct Solar Radiation",
yName="Shortwave Solar Radiation"
)
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## user system elapsed
## 0.75 0.11 0.85
Sys.time() - t0
## Time difference of 1.08295 secs
A simple model is run to predict surface pressure as a function of dewpoint, without considering location:
# Create model
tstLM <- allCity %>%
select(dewpoint_2m, surface_pressure) %>%
lm(surface_pressure ~ dewpoint_2m, data=.)
# Summary
summary(tstLM)
##
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.042 -15.314 5.729 17.119 83.092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.871e+02 3.631e-02 27182.2 <2e-16 ***
## dewpoint_2m 9.507e-01 2.632e-03 361.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.72 on 854206 degrees of freedom
## Multiple R-squared: 0.1325, Adjusted R-squared: 0.1325
## F-statistic: 1.305e+05 on 1 and 854206 DF, p-value: < 2.2e-16
# Review of predictions
allCity %>%
select(src, dewpoint_2m, surface_pressure) %>%
mutate(pred=predict(tstLM, newdata=.)) %>%
mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
count(surface_pressure, pred) %>%
ggplot(aes(x=pred, y=surface_pressure)) +
geom_point(aes(size=n), alpha=0.25) +
geom_smooth(aes(weight=n), method="lm") +
geom_abline(slope=1, intercept=0, lty=2, color="red") +
labs(x="Prediction",
y="Actual",
title="Predictions for Surface Pressure",
subtitle="Surface Pressure ~ Dewpoint"
) +
scale_size_continuous(NULL)
## `geom_smooth()` using formula = 'y ~ x'
The model is updated to predict surface pressure as a function of dewpoint, considering location:
# Create model
tstLM_002 <- allCity %>%
select(dewpoint_2m, surface_pressure, src) %>%
lm(surface_pressure ~ dewpoint_2m:src + src, data=.)
# Summary
summary(tstLM_002)
##
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m:src + src, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.856 -2.901 0.233 3.277 28.618
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.017e+03 1.844e-02 55129.10 <2e-16 ***
## srcChicago -2.021e+01 2.594e-02 -779.18 <2e-16 ***
## srcHouston 4.918e+00 4.032e-02 121.98 <2e-16 ***
## srcLA -3.904e+01 2.910e-02 -1341.57 <2e-16 ***
## srcMiami 5.985e+00 7.381e-02 81.09 <2e-16 ***
## srcNYC -2.917e+00 2.700e-02 -108.04 <2e-16 ***
## srcVegas -8.141e+01 2.487e-02 -3273.38 <2e-16 ***
## dewpoint_2m:srcBoston -1.907e-01 1.561e-03 -122.19 <2e-16 ***
## dewpoint_2m:srcChicago -2.323e-01 1.493e-03 -155.62 <2e-16 ***
## dewpoint_2m:srcHouston -4.411e-01 2.017e-03 -218.66 <2e-16 ***
## dewpoint_2m:srcLA -2.287e-01 2.261e-03 -101.17 <2e-16 ***
## dewpoint_2m:srcMiami -3.044e-01 3.512e-03 -86.66 <2e-16 ***
## dewpoint_2m:srcNYC -1.990e-01 1.607e-03 -123.81 <2e-16 ***
## dewpoint_2m:srcVegas -1.471e-01 2.120e-03 -69.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.725 on 854194 degrees of freedom
## Multiple R-squared: 0.9602, Adjusted R-squared: 0.9602
## F-statistic: 1.585e+06 on 13 and 854194 DF, p-value: < 2.2e-16
# Review of predictions
allCity %>%
select(src, dewpoint_2m, surface_pressure) %>%
mutate(pred=predict(tstLM_002, newdata=.)) %>%
mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
count(src, surface_pressure, pred) %>%
ggplot(aes(x=pred, y=surface_pressure)) +
geom_point(aes(size=n, color=src), alpha=0.25) +
geom_smooth(aes(weight=n), method="lm") +
geom_abline(slope=1, intercept=0, lty=2, color="red") +
labs(x="Prediction",
y="Actual",
title="Predictions for Surface Pressure",
subtitle="Surface Pressure ~ Dewpoint:City + City"
) +
scale_size_continuous(NULL) +
scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'
Long term daily data is downloaded for Atlanta:
# Daily data download for Atlanta, GA
testURLDaily <- helperOpenMeteoURL(cityName="Atlanta GA",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Eastern"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=33.76&longitude=-84.42&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_atl.json")) {
fileDownload(fileName="testOM_daily_atl.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_atl.json already exists, skipping download\n")
}
##
## File testOM_daily_atl.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
atlOMDaily <- formatOpenMeteoJSON("testOM_daily_atl.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 71 2.6 0.1
## 2 1960-01-02 1960-01-02 61 11.4 1.5
## 3 1960-01-03 1960-01-03 63 14.7 2
## 4 1960-01-04 1960-01-04 53 7.3 -0.1
## 5 1960-01-05 1960-01-05 51 7.1 0.2
## 6 1960-01-06 1960-01-06 53 7.3 4.5
## 7 1960-01-07 1960-01-07 61 7.4 3.1
## 8 1960-01-08 1960-01-08 2 10.7 0.4
## 9 1960-01-09 1960-01-09 3 15 -0.8
## 10 1960-01-10 1960-01-10 3 17.5 4
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 33.8 -84.4 388. -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 33.77856
## longitude: -84.40299
## generationtime_ms: 388.082
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: EST
## elevation: 302
# Sample records of tibble
atlOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ windspeed_10m_max <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
Variables are converted to proper data type:
dfDailyATL <- atlOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyATL)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Atlanta 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Atlanta 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Atlanta 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Atlanta 1960-2023)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Atlanta 1960-2023)"
)
Boxplots for temperature are created:
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Atlanta 1960-2023)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Atlanta 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyATL$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 365 737
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 549 916
pacfTemp <- pacf(dfDailyATL$temperature_2m_max, lag.max=1000)
pacf(dfDailyATL$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyATL$precipitation_sum, lag.max=1000)
acf(dfDailyATL$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 28 52 78 100 126 140 157 187 202 213 250 284 311 335 361 377 406 428 442
## [20] 465 481 502 523 540 553 585 609 636 650 680 692 710 728 744 757 769 791 803
## [39] 825 852 870 897 913 930 941 961 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 18 56 71 97 120 134 166 178 195 210 226 238 268 288 302 320 332 344 359
## [20] 383 409 420 438 452 471 483 497 533 551 565 581 598 613 633 654 676 695 719
## [39] 743 766 799 812 842 862 902 920 948 977
pacfPrecip <- pacf(dfDailyATL$precipitation_sum, lag.max=1000)
pacf(dfDailyATL$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value
ACF and PACF are explored for windspeed:
acfWind <- acf(dfDailyATL$windspeed_10m_max, lag.max=1000)
acf(dfDailyATL$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 13 172 183 338 361 378 553 738 894 921
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 174 185 555 730 891 910 944
pacfWind <- pacf(dfDailyATL$windspeed_10m_max, lag.max=1000)
pacf(dfDailyATL$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.75 for temperature)
A boxplot for delta daily temperature is created:
dfDailyATL %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyATL %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally larger in magnitude during the colder season
A boxplot for delta daily precipitation is created:
dfDailyATL %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyATL %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyATL %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Percentage of cumulative precipitation by volume of daily precipitation is explored:
tmpPlotData <- dfDailyATL %>%
select(date, precipitation_sum) %>%
group_by(precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n))
tmpPlotData %>%
pivot_longer(cols=-c(precipitation_sum)) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(data=~filter(., name %in% c("cpp", "cpn")),
aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric")
Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 25+ mm (about an inch) of precipitation
Total precipitation by decade is also explored, sorted by daily precipitation:
dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(decade) %>%
mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
ungroup() %>%
ggplot(aes(x=precipitation_sum, y=meancsp)) +
geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) +
labs(x="Daily precipitation (mm)",
y="Cumulative annual precipitation (mm)",
title="Average annual precipitation by decade",
subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
) +
scale_color_discrete("Decade")
Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 202s, driving overall precipitation totals ~50% higher
Precipitation appears to have broken trend starting in the late 2010s:
dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, year) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
ggplot(aes(x=year, y=precip)) +
geom_line(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), lwd=1) +
geom_smooth(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), method="lm", lty=2) +
geom_point() +
labs(x=NULL, y="Annual precipitation (mm)") +
scale_color_discrete(NULL) +
lims(y=c(0, NA))
## `geom_smooth()` using formula = 'y ~ x'
Precipitation by month pre-2017 is explored:
yMax <- dfDailyATL %>%
group_by(year=year(date), month=month(date)) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
mutate(precip=ceiling(precip/50)*50) %>%
pull() %>%
max()
yMax
## [1] 350
p1 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", year)
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
group_by(tp, month) %>%
summarize(mu=mean(precip), sd=sd(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_col(data=~filter(., tp=="pre"), aes(y=mu), fill="lightblue") +
geom_errorbar(data=~filter(., tp=="pre"), aes(ymin=mu-sd, ymax=mu+sd), width=0.2) +
geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) +
labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation +/- 1 SD\n(pre-2017)") +
lims(y=c(0, yMax))
p1
Precipitation by month post-2017 is explored:
p2 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", year)
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
filter(tp!="pre") %>%
group_by(month) %>%
summarize(mu=mean(precip), mx=max(precip), mn=min(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_col(aes(y=mu), fill="lightblue") +
geom_errorbar(aes(ymin=mn, ymax=mx), width=0.2) +
geom_hline(color="red", lty=2, aes(yintercept=mean(mu))) +
labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\nmax-mean-min (post-2017)") +
lims(y=c(0, yMax))
p2
The data are plotted on a single plot:
p3 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
group_by(tp, month) %>%
summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") +
geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) +
geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
annotate("text", x=1, y=25, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) +
annotate("text", x=1, y=300, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) +
labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\n(pre/post-2017)") +
lims(y=c(0, yMax))
p3
The average number of days with precipitation by month is explored:
yMax <- dfDailyATL %>%
group_by(year=year(date), month=month(date)) %>%
summarize(precip=sum(precipitation_sum>0), .groups="drop") %>%
mutate(precip=ceiling(precip/5)*5) %>%
pull() %>%
max()
yMax
## [1] 35
p4 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum>0), .groups="drop") %>%
group_by(tp, month) %>%
summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") +
geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) +
geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
annotate("text", x=1, y=5, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) +
annotate("text", x=1, y=30, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) +
labs(x=NULL,
y="Monthly precipitation (days greater than zero)",
title="Monthly days with precipitation\n(pre/post-2017)") +
lims(y=c(0, yMax))
p4
Median precipitation on days with precipitation > 0, by month, is explored:
p5 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, month) %>%
summarize(nprecip=sum(precipitation_sum>0),
tprecip=sum(precipitation_sum),
mdnprecip=median(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
muprecip=mean(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
.groups="drop"
) %>%
select(tp, month, mdnprecip, muprecip) %>%
pivot_longer(-c(tp, month)) %>%
ggplot(aes(x=month)) +
geom_line(aes(y=value, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
facet_wrap(~c("mdnprecip"="Median (mm) on days with precipitation",
"muprecip"="Mean (mm) on days with precipitation"
)[name]
) +
labs(x=NULL,
title="Monthly precipitation on days with precipitation greater than zero\n(pre/post-2017)",
y="Precipitation (mm)"
) +
ylim(c(0, NA)) +
scale_color_discrete(NULL)
p5
The heaviest 3-day precipitation by year is explored:
p6 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, year) %>%
helperRollingAgg(origVar="precipitation_sum", newName="r3precip", func=zoo::rollsum, k=3) %>%
summarize(r3max=max(r3precip, na.rm=TRUE), .groups="drop") %>%
ggplot(aes(x=year)) +
geom_line(aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
labs(x=NULL,
title="Maximum 3-day precipitation by year",
y="rolling 3-day sum of precipitation (mm)"
) +
geom_smooth(method="lm",
se=TRUE,
aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])
) +
ylim(c(0, NA)) +
scale_color_discrete(NULL)
p6
## `geom_smooth()` using formula = 'y ~ x'
Long term daily data is downloaded for Detroit:
# Daily data download for Detroit, MI
testURLDaily <- helperOpenMeteoURL(cityName="Detroit MI",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Eastern"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=42.38&longitude=-83.1&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_dtw.json")) {
fileDownload(fileName="testOM_daily_dtw.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_dtw.json already exists, skipping download\n")
}
##
## File testOM_daily_dtw.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
dtwOMDaily <- formatOpenMeteoJSON("testOM_daily_dtw.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 3 0.4 -4.4
## 2 1960-01-02 1960-01-02 55 3.2 -2.5
## 3 1960-01-03 1960-01-03 51 2.7 -2.9
## 4 1960-01-04 1960-01-04 3 -3.3 -8.4
## 5 1960-01-05 1960-01-05 3 -6 -9.7
## 6 1960-01-06 1960-01-06 3 -0.8 -10.7
## 7 1960-01-07 1960-01-07 3 4.1 -4.9
## 8 1960-01-08 1960-01-08 51 2.5 -6.7
## 9 1960-01-09 1960-01-09 51 1.8 -7.3
## 10 1960-01-10 1960-01-10 51 4.7 -2
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 42.4 -83.1 372. -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 42.35501
## longitude: -83.13785
## generationtime_ms: 371.5783
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: GMT-5
## elevation: 199
# Sample records of tibble
dtwOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ windspeed_10m_max <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
Variables are converted to proper data type:
dfDailyDTW <- dtwOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyDTW)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Detroit 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Detroit 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
The apparent decrease in days with northerly winds is explored further:
dfDailyWind <- dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=300~"1. N",
winddirection_10m_dominant>240~"2. E/W",
winddirection_10m_dominant>=120~"3. S",
winddirection_10m_dominant>60~"2. E/W",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
)
) %>%
select(year, month, date, winddir, winddirection_10m_dominant, windspeed_10m_max)
dfDailyWind
## # A tibble: 23,376 × 6
## year month date winddir winddirection_10m_dominant windspeed_10m_max
## <dbl> <fct> <date> <chr> <int> <dbl>
## 1 1960 Jan 1960-01-01 2. E/W 109 15
## 2 1960 Jan 1960-01-02 3. S 196 21.5
## 3 1960 Jan 1960-01-03 2. E/W 261 25.6
## 4 1960 Jan 1960-01-04 2. E/W 247 23
## 5 1960 Jan 1960-01-05 2. E/W 257 28.7
## 6 1960 Jan 1960-01-06 3. S 239 30.3
## 7 1960 Jan 1960-01-07 3. S 223 30.9
## 8 1960 Jan 1960-01-08 2. E/W 272 32.8
## 9 1960 Jan 1960-01-09 3. S 161 18.4
## 10 1960 Jan 1960-01-10 2. E/W 284 10.2
## # ℹ 23,366 more rows
dfDailyWind %>%
count(year, winddir) %>%
group_by(year) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
ggplot(aes(x=year, y=pct)) +
geom_line(aes(group=winddir, color=winddir), lwd=1) +
labs(x=NULL,
y="Percent of Days",
title=paste0("Predominant Wind Direction", " (Detroit 1960-2023)"),
caption="N (300-060)\nE/W (061-119, 241-299)\nS (120-240)"
) +
scale_color_discrete(NULL) +
geom_hline(data=~summarize(group_by(., winddir), pct=mean(pct)),
aes(color=winddir, group=winddir, yintercept=pct),
lty=2
) +
lims(y=c(0, NA)) +
theme(legend.position = "bottom")
Boxplots for maximum windspeed are created:
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Detroit 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Detroit 1960-2023)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Detroit 1960-2023)"
)
Boxplots for temperature are created:
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Detroit 1960-2023)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Detroit 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyDTW$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 367 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 184 551 915
pacfTemp <- pacf(dfDailyDTW$temperature_2m_max, lag.max=1000)
pacf(dfDailyDTW$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyDTW$precipitation_sum, lag.max=1000)
acf(dfDailyDTW$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 30 56 77 101 118 147 179 197 213 250 266 301 327 358 371 400 428 442 454
## [20] 469 489 508 520 541 553 573 588 605 618 631 644 657 678 690 727 746 769 790
## [39] 820 847 859 879 898 918 929 943 958 974
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 18 32 60 86 114 137 162 189 209 247 261 280 296 325 340 354 386 410 426
## [20] 451 462 494 515 535 550 568 583 603 615 652 680 708 725 742 771 806 831 852
## [39] 875 907 922 934 956 977
pacfPrecip <- pacf(dfDailyDTW$precipitation_sum, lag.max=1000)
pacf(dfDailyDTW$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyDTW$windspeed_10m_max, lag.max=1000)
acf(dfDailyDTW$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 28 41 73 142 167 182 202 228 239 281 316 330 354 374 408 476 490 523 537
## [20] 564 620 695 717 737 834 852 873 900 952 977
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 31 87 136 154 169 191 210 253 320 356 376 404 451 483 502 518 556 575 588
## [20] 600 753 839 869 916 935 948
pacfWind <- pacf(dfDailyDTW$windspeed_10m_max, lag.max=1000)
pacf(dfDailyDTW$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)
A boxplot for delta daily temperature is created:
dfDailyDTW %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyDTW %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally larger in magnitude during the colder season
A boxplot for delta daily precipitation is created:
dfDailyDTW %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyDTW %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyDTW %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Percentage of cumulative precipitation by volume of daily precipitation is explored:
tmpPlotData <- dfDailyDTW %>%
select(date, precipitation_sum) %>%
group_by(precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n))
tmpPlotData %>%
pivot_longer(cols=-c(precipitation_sum)) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(data=~filter(., name %in% c("cpp", "cpn")),
aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric")
Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 15+ mm (above about half an inch) of precipitation
Total precipitation by decade is also explored, sorted by daily precipitation:
dfDailyDTW %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(decade) %>%
mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
ungroup() %>%
ggplot(aes(x=precipitation_sum, y=meancsp)) +
geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) +
labs(x="Daily precipitation (mm)",
y="Cumulative annual precipitation (mm)",
title="Average annual precipitation by decade",
subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
) +
scale_color_discrete("Decade")
Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2020s and lightest in the 1960s, driving differences in annual precipitation
Long term daily data is downloaded for Chicago:
# Daily data download for Chicago, IL
testURLDaily <- helperOpenMeteoURL(cityName="Chicago IL",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Central"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=41.84&longitude=-87.68&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_ord.json")) {
fileDownload(fileName="testOM_daily_ord.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_ord.json already exists, skipping download\n")
}
##
## File testOM_daily_ord.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
ordOMDaily <- formatOpenMeteoJSON("testOM_daily_ord.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 3 1.6 -3.5
## 2 1960-01-02 1960-01-02 51 4.9 -0.9
## 3 1960-01-03 1960-01-03 3 -0.6 -7.5
## 4 1960-01-04 1960-01-04 2 -5.9 -11.8
## 5 1960-01-05 1960-01-05 3 -6.1 -11.2
## 6 1960-01-06 1960-01-06 1 1 -10.3
## 7 1960-01-07 1960-01-07 3 4.9 -3.2
## 8 1960-01-08 1960-01-08 3 1.6 -2.8
## 9 1960-01-09 1960-01-09 51 7.1 -3.4
## 10 1960-01-10 1960-01-10 51 4.2 -1
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 41.9 -87.6 369. -21600 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 41.86292
## longitude: -87.64877
## generationtime_ms: 368.9336
## utc_offset_seconds: -21600
## timezone: US/Central
## timezone_abbreviation: GMT-6
## elevation: 180
# Sample records of tibble
ordOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ windspeed_10m_max <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
Variables are converted to proper data type:
dfDailyORD <- ordOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyORD)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Chicago 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Chicago 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Chicago 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Chicago 1960-2023)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Chicago 1960-2023)"
)
Boxplots for snow are also created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, snowfall_sum) %>%
ggplot(aes(x=month, y=snowfall_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily snowfall (liquid equivalent mm)",
title="Daily snowfall by month (Chicago 1960-2023)"
)
Boxplots for temperature are created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Chicago 1960-2023)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Chicago 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyORD$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 366 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 180 551 915
pacfTemp <- pacf(dfDailyORD$temperature_2m_max, lag.max=1000)
pacf(dfDailyORD$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyORD$precipitation_sum, lag.max=1000)
acf(dfDailyORD$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 42 80 94 122 167 179 205 216 229 251 268 288 302 332 350 388 413 443 454
## [20] 469 492 528 570 599 612 633 654 690 730 758 770 789 808 820 845 865 881 915
## [39] 926 940 951 976
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 24 49 67 86 114 137 163 181 192 218 231 248 260 281 311 324 340 355 375
## [20] 400 430 451 462 482 521 535 549 560 583 597 616 631 662 687 713 736 757 776
## [39] 795 822 853 875 889 934 949 964 979
pacfPrecip <- pacf(dfDailyORD$precipitation_sum, lag.max=1000)
pacf(dfDailyORD$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyORD$windspeed_10m_max, lag.max=1000)
acf(dfDailyORD$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 149 167 182 207 228 239 353 373 408 476 523 538 554 572 594 620 722 737 852
## [20] 883 901 926 953
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 112 126 137 154 169 190 217 253 278 376 469 483 506 518 556 589 633 725 739
## [20] 753 847 862 879 916 939 959 976
pacfWind <- pacf(dfDailyORD$windspeed_10m_max, lag.max=1000)
pacf(dfDailyORD$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)
A boxplot for delta daily temperature is created:
dfDailyORD %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyORD %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally similar in magnitude by season
A boxplot for delta daily precipitation is created:
dfDailyORD %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyORD %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyORD %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Percentage of cumulative precipitation by volume of daily precipitation is explored:
tmpPlotData <- dfDailyORD %>%
select(date, precipitation_sum) %>%
group_by(precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n))
tmpPlotData %>%
pivot_longer(cols=-c(precipitation_sum)) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(data=~filter(., name %in% c("cpp", "cpn")),
aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric")
Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 20+ mm (nearly an inch) of precipitation
Total precipitation by decade is also explored, sorted by daily precipitation:
dfDailyORD %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(decade) %>%
mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
ungroup() %>%
ggplot(aes(x=precipitation_sum, y=meancsp)) +
geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) +
labs(x="Daily precipitation (mm)",
y="Cumulative annual precipitation (mm)",
title="Average annual precipitation by decade",
subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
) +
scale_color_discrete("Decade")
Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2010s/2020s and lightest in the 1960s, driving differences in annual precipitation
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window, for Atlanta, Detroit, and Chicago:
df_r21 <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(doy=pmin(yday(date), 365),
src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
) %>%
group_by(src) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(src, date, doy, contains("_r21")) %>%
ungroup()
df_r21 %>%
na.omit() %>%
group_by(src, doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n(), .groups="drop") %>%
pivot_longer(cols=-c(src, doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=src, color=src), lwd=1) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="bottom") +
scale_color_discrete(NULL)
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(doy=pmin(yday(date), 365),
src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
) %>%
group_by(src, doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
group_by(src) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(src, date, doy, contains("_r21")) %>%
ungroup()
df_r21_sd %>%
na.omit() %>%
group_by(src, doy) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(src, doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=src, color=src), lwd=1) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="bottom")